Itemset generalization with cardinality-based constraints

Authors

  • Luca Cagliero
  • Paolo Garza
Abstract

Generalized itemset mining is an established data mining technique that focuses on discovering high-level correlations in large databases. By exploiting a taxonomy built over the data items, items are aggregated into higher-level concepts and, thus, data correlations at different abstraction levels can be discovered. However, since a large number of patterns can be extracted, the result of the mining process is often not easily manageable by domain experts. We propose a novel approach to discovering a compact subset of generalized itemsets from structured data. To guarantee model conciseness and readability, a set of itemsets that has a common generalization is generated only when its cardinality is so small that its manual inspection is practically feasible. Furthermore, generalizations are generated only when their knowledge is covered by a large number of low-level descendant itemsets, since only in these cases are they worth considering in place of their many low-level descendants. Experiments performed on synthetic, benchmark, and real data taken from a mobile application scenario demonstrate the effectiveness and efficiency of the proposed approach.

Corresponding author: Tel.: +39 011 090 7084. Fax: +39 011 090 7099.
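The selection rule described in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: the one-level taxonomy, the toy transactions, the restriction to 2-itemsets, and the names `compact_itemsets`, `min_support`, and `max_cardinality` are all assumptions made for illustration. Frequent low-level itemsets are grouped by their common generalization. A group is kept as is when it is small enough to inspect by hand, and is otherwise replaced by the single generalized itemset.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical one-level taxonomy: leaf item -> higher-level concept.
TAXONOMY = {
    "turin": "italy", "rome": "italy", "milan": "italy",
    "paris": "france",
    "wifi": "network", "3g": "network",
}

# Hypothetical toy transactions (for example, mobile application requests).
TRANSACTIONS = [
    {"turin", "wifi"},
    {"rome", "wifi"},
    {"milan", "wifi"},
    {"paris", "3g"},
    {"paris", "3g"},
]


def frequent_pairs(transactions, min_support):
    """Count every 2-itemset and keep those meeting the support threshold."""
    counts = defaultdict(int)
    for transaction in transactions:
        for pair in combinations(sorted(transaction), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def generalize(items, taxonomy):
    """Replace each item by its parent concept (items without a parent stay)."""
    return tuple(sorted({taxonomy.get(item, item) for item in items}))


def compact_itemsets(transactions, taxonomy, min_support, max_cardinality):
    """Keep small groups of low-level itemsets; replace large groups
    by their common generalization (assumed reading of the abstract)."""
    groups = defaultdict(dict)
    for itemset, support in frequent_pairs(transactions, min_support).items():
        groups[generalize(itemset, taxonomy)][itemset] = support

    result = {}
    for general, members in groups.items():
        if len(members) <= max_cardinality:
            # Few descendants: manual inspection is feasible, keep them.
            result.update(members)
        else:
            # Many descendants: report the generalization with its own support.
            result[general] = sum(
                1 for t in transactions
                if set(general) <= set(generalize(t, taxonomy))
            )
    return result


result = compact_itemsets(TRANSACTIONS, TAXONOMY, min_support=1, max_cardinality=2)
print(result)
```

With these toy inputs, the three city-specific itemsets that share the generalization (italy, network) exceed `max_cardinality` and collapse into that single generalized itemset, while the lone Paris itemset is kept at the low level.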

Similar resources

CS 730R: Topics in Data and Information Management

1. Summary. In this paper the authors propose a differentially private algorithm for mining frequent itemsets. This work differs from the other privacy-preserving miners in the literature in that the algorithm mines itemsets by enforcing cardinality constraints on the transactions in the dataset. In particular, the authors study how the reduction of the cardinality of the ...

Towards an Enhanced Semantic Approach Based on Formal Concept Analysis and Lift Measure

The volume of stored data increases rapidly. Therefore, the large number of extracted associations hinders effective support of the decision maker. In this context, building on Formal Concept Analysis, we propose to extend the notion of Formal Concept through the generalization of the notion of itemset, aiming to consider the itemset as an intent, its support as the cardinality of the ...

An Improvement of Optimal Couverture Extraction Using Semantic Approach

The amount of data grows rapidly. Consequently, the excessive number of extracted associations makes it much harder to assist the decision maker. In this respect, building on Formal Concept Analysis, we propose to extend the notion of Formal Concept through the generalization of the notion of itemset, aiming to consider the itemset as an intent, its support as the cardinality of ...

Formal Concept Analysis Based Association Rules Extraction

Generating a huge number of association rules reduces their utility in the decision making process, done by domain experts. In this context, based on the theory of Formal Concept Analysis, we propose to extend the notion of Formal Concept through the generalization of the notion of itemset in order to consider the itemset as an intent, its support as the cardinality of the extent and its releva...

DisClose: discovering colossal closed itemsets from high dimensional datasets via a compact row-tree

Data mining is an essential part of knowledge discovery, and performs the extraction of useful information from a collection of data, so as to assist human beings in making necessary decisions. This thesis describes research in the field of itemset mining, which performs the extraction of a set of items that occur together in a dataset, based on a user specified threshold. Recent focus of items...

Journal:
  • Inf. Sci.

Volume 244

Publication date: 2013